Add RTP/v3 interface design by JoTurk · Pull Request #362 · pion/rtp

JoTurk · 2026-05-25T05:17:18Z

Description

Note: that this is V3 because rtp/v2 was released by mistake a few years ago, This moves the library more towards a generic RTP library, this is backward compatible with v1 for users with custom packetizers/depacketizers, but we should drop this in v4.

Today the payloader interface is:

type Payloader interface {
    Payload(mtu uint16, payload []byte) [][]byte
}

and the generic webrtc-centeric packetizer builds RTP packets afterward. In the current interface, the market bit is hardcoded as "true on the last emitted payload fragment" which was true for many webrtc rtp payloads.

However, RTP defines the market bit as profile-specific not last fragment, RTP-MIDI is a clean example of the limitation of this interface #361 This RFC says the marker bit depends on the media type, for mpeg-genric for example it's always one, and this can't be expressed by "last payload wins".

When it comes to the depacketizer, the API has related problems:

type Depacketizer interface {
    Unmarshal(packet []byte) ([]byte, error)
    IsPartitionHead(payload []byte) bool
    IsPartitionTail(marker bool, payload []byte) bool
}

Unmarshal only receives the RTP payload, IsPartitionHead receives only payload bytes, and only IsPartitionTail gets the marker bit. The full RTP header, timestamp, sequence number, extensions, padding state, SSRC, and payload type are not available to the depacketizer methods.

Then, The samplebuilder then calls the depacketizer in a way that reinforces this limitation: it checks the tail with Marker and Payload, checks the head with only Payload, and appends the result of Unmarshal(payload) to the sample data.

This broke down for me when I was finishing av1 support because RTP can omit the length field, and had to make the interface stateful to deal with stuff like W !=0, and it wasn't not possible to honor the push / append model.

Many depacketizers, carry mutable depacketization state such as fuaBuffer in H264Packet, so a packet type is also acting as a stateful depacketizer.

this refactors the interface into

codec bitstream reader/writer
        ↓
RTP payload-format packetizer/depacketizer
        ↓
generic RTP packet sequencer/parser
        ↓
sample builder / jitter buffer / transport integration

which let's us define interfaces for codecs that have different input payloads e.g accepting avc and annex-b h264 without having hacks or random options, Also being able to write avc or annex-b as output without multiple transformation.

Reference issue

Fixes #69 and fixes #70

JoTurk · 2026-05-25T05:17:43Z

(I submitted without writing the description, writing it with edit ig.)

codecov · 2026-05-25T05:18:50Z

Codecov Report

❌ Patch coverage is 76.92308% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.58%. Comparing base (77ebb16) to head (f484a14).

Files with missing lines	Patch %	Lines
payload_format.go	76.92%	6 Missing and 3 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #362      +/-   ##
==========================================
- Coverage   82.64%   82.58%   -0.07%     
==========================================
  Files          28       29       +1     
  Lines        3440     3479      +39     
==========================================
+ Hits         2843     2873      +30     
- Misses        427      433       +6     
- Partials      170      173       +3

Flag	Coverage Δ
go	`82.58% <76.92%> (-0.07%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

JoTurk · 2026-05-25T05:30:31Z

@jech you didn't like the current RTP interface, what do you think about this one?

sirzooro · 2026-05-25T06:13:03Z

There are many things marked as deprecated in pion/rtp, it would be good to remove them during migration to v3.

JoTurk · 2026-05-25T06:14:50Z

@sirzooro thank you for the reminder, and yes this won't be the last PR, we should also move some stuff around.

jwetzell · 2026-05-25T13:15:55Z

+
+package rtp
+
+// MediaFormat describes the bitstream or sample representation passed to a payload format.


Not sure how much is applicable here, but I think whenever possible the RFC should be referenced for naming and rational behind particular concepts. Even if its just a comment with the RFC # + section.

This is mostly application logic but I'll try to reference all my reasons, and specs behind every interface.

jech · 2026-05-25T16:49:00Z

I need to think it over, but at first sight this looks good. Two questions.

Why do we need the PacketizerContext as an explicit parameter? The Packetizer is a stateful object, right, so why not make the context part of the Packetizer?

Why do we need the Flush method?

JoTurk · 2026-05-25T18:46:02Z

@jech yeah I agree it was the first iteration and I just added too much, going to remove context from the public input, the user only needs to configure de/packetizers once (e.g avc vs annexb for h264).
Also agree about flush it's not needed since the new interface owns all the logic.
Going to remove them both.

jech · 2026-05-27T11:37:02Z

***@***.**** commented on this pull request. So every *depacketizer* would have to deal with the concepts of samples and keyframes even if they don't make sense for that particular payload format (probably just by ignoring the fields)?

As I understand it, a sample corresponds to a sequence of rtp packets that have the same timestamp. This corresponds to a frame in Opus and VP8, and a Picture in VP9. I forget the AV1 term. A keyframe is just a sample at which you may start decoding a stream without any knowledge beyond what is in the SDP. The concept maps to every current codec (or else how would you join a conference in the middle?), although not always in an obvious manger (think of the parameter OBUs in AVC, for example). Joe, shall we rename sample to frame? Too bad for VP9, but I think they got the terminology wrong.

fippo · 2026-05-27T15:11:57Z

A keyframe is a frame with no dependencies to prior frames. https://medooze.medium.com/mastering-the-av1-svc-chains-a4b2a6a23925 is still a very good reference. RTP shouldn't know about this ideally.

JoTurk · 2026-05-27T16:14:36Z

@fippo In av1, RTP needs to know if it's a keyframe to set the N bit correctly, this is a limitation in Pion/RTP AV1 impl is that we set the N bit for all seq-headers and not only for key frames, because I didn't want to parse again in the packetizer

rtp/codecs/av1_packet_test.go

Lines 1447 to 1490 in 1edc72d

    
           { 
        
           	Name: "Sequence header should start a new packet", 
        
           	MTU:  1000, 
        
           	InputPayload: (testAV1MultiOBUsPayload{ 
        
           		{ 
        
           			Header: &obu.Header{ 
        
           				Type:         obu.OBUFrameHeader, 
        
           				HasSizeField: true, 
        
           			}, 
        
           			Payload: []byte{0x01, 0x02, 0x03, 0x04, 0x05}, 
        
           		}, 
        
           		{ 
        
           			Header: &obu.Header{ 
        
           				Type: obu.OBUSequenceHeader, 
        
           			}, 
        
           			Payload: []byte{0x06, 0x07, 0x08, 0x09, 0x0A}, 
        
           		}, 
        
           	}).Marshal(), 
        
           	OutputPayloads: [][]byte{ 
        
           		append( 
        
           			(testAV1AggregationHeader{ 
        
           				W: 1, 
        
           			}).Marshal(), 
        
           			(testAV1OBUPayload{ 
        
           				Header: &obu.Header{ 
        
           					Type: obu.OBUFrameHeader, 
        
           				}, 
        
           				Payload: []byte{0x01, 0x02, 0x03, 0x04, 0x05}, 
        
           			}).Marshal()..., 
        
           		), 
        
           		append( 
        
           			(testAV1AggregationHeader{ 
        
           				W: 1, 
        
           				N: true, 
        
           			}).Marshal(), 
        
           			(testAV1OBUPayload{ 
        
           				Header: &obu.Header{ 
        
           					Type: obu.OBUSequenceHeader, 
        
           				}, 
        
           				Payload: []byte{0x06, 0x07, 0x08, 0x09, 0x0A}, 
        
           			}).Marshal()..., 
        
           		), 
        
           	}, 
        
           },

Libwebrtc sets it correctly to only keyframes https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/rtp_rtcp/source/rtp_packetizer_av1.cc;l=377?q=rtp_packetizer_av1.cc&ss=chromium%2Fchromium%2Fsrc

While it seems like setting the N flag for all seq headers doesn't break any impls, e.g chrome doesn't even construct full / real frames, because decoders don't require them. https://chromium.googlesource.com/external/webrtc/+/HEAD/modules/rtp_rtcp/source/video_rtp_depacketizer_av1.cc#387

But this change will make us correct Pion AV1 and other codecs impls that needs to be aware about "key frame".

A similar limitation is that we don't set the P flag at all in our hevc impl, and we should set it at the end of video seq, This redesign should make all of this possible without parsing again inside RTP (and pion/webrtc and in the user application sometimes e.g with the current design the user has to do avc to annex-b before passing it to pion!).

jech · 2026-05-27T16:34:40Z

A keyframe is a frame with no dependencies to prior frames. https://medooze.medium.com/mastering-the-av1-svc-chains-a4b2a6a23925 is still a very good reference. RTP shouldn't know about this ideally.

In addition to what Joe said, applications need to be able to identify a keyframe. For example, when a client joins a conference, it must not try to decode the stream until it sees a keyframe.

jwetzell · 2026-05-27T17:34:58Z

@JoTurk I think I'm piecing this together so the PacketInfo here would kind of in the realm of this in the chromium example?

sirzooro · 2026-05-31T20:44:57Z

+}
+
+// MediaSample is the media input passed to a payload-format packetizer.
+type MediaSample struct {


There is no timestamp field here. It would would be needed to implement extensions like ones from RFC 6051, or deal with gaps in media stream.

JoTurk requested review from Sean-Der, jech and jwetzell May 25, 2026 05:29

Add RTP/v3 interface design

f484a14

JoTurk force-pushed the jo/rtp-v3 branch from 811ab84 to f484a14 Compare May 25, 2026 05:35

JoTurk requested a review from at-wat May 25, 2026 05:44

JoTurk requested a review from sirzooro May 25, 2026 12:35

jwetzell reviewed May 25, 2026

View reviewed changes

sirzooro reviewed May 31, 2026

View reviewed changes


		package rtp

		// MediaFormat describes the bitstream or sample representation passed to a payload format.

Conversation

JoTurk commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Reference issue

Uh oh!

JoTurk commented May 25, 2026

Uh oh!

codecov Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

JoTurk commented May 25, 2026

Uh oh!

sirzooro commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JoTurk commented May 25, 2026

Uh oh!

jwetzell May 25, 2026

Choose a reason for hiding this comment

Uh oh!

JoTurk May 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jech commented May 25, 2026

Uh oh!

JoTurk commented May 25, 2026

Uh oh!

jech commented May 27, 2026 via email

Uh oh!

fippo commented May 27, 2026

Uh oh!

JoTurk commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jech commented May 27, 2026

Uh oh!

jwetzell commented May 27, 2026

Uh oh!

sirzooro May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

JoTurk commented May 25, 2026 •

edited

Loading

codecov Bot commented May 25, 2026 •

edited

Loading

sirzooro commented May 25, 2026 •

edited

Loading

JoTurk commented May 27, 2026 •

edited

Loading